8370345: Parallel: Rework TLAB accounting in MutableNUMASpace #27935

jsikstro · 2025-10-22T08:56:22Z

Hello,

Parallel's MutableNUMASpace is the only GC interface that uses the Thread parameter passed through the general CollectedHeap interface to tlab_capacity, tlab_used, and unsafe_max_tlab_alloc. It would be nice if Parallel's MutableNUMASpace could do without the Thread and instead find a thread-agnostic approach. By removing the need for the thread, it becomes possible to clean up the shared CollectedHeap interface, which makes it easier to read and maintain all GCs. Also, the lgrp_id that is stored in the Thread class should really have been moved to GCThreadLocalData after that concept was created, but with a thread-agnostic approach, the field can be removed entirely.

The current solution is not without problems. When a new allocation is made inside one of the LGRP spaces in MutableNUMASpace using cas_allocate(), the NUMA/LGRP id is polled and stored inside the Thread, and we only attempt to allocate on that LGRP. If allocation fails on the local LGRP, we do not try to allocate on any other (remote) LGRP(s). This fact is reflected in the TLAB accounting methods tlab_capacity, tlab_used, and unsafe_max_tlab_alloc, which only check how much memory is used, etc., for the LGRP matching the stored LGRP id in the Thread. This model breaks down when threads are allowed to migrate between different CPUs, and therefore also NUMA nodes, which might change the LGRP id.
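
The per-LGRP model described above can be sketched roughly as follows. This is a simplified, standalone model and not the HotSpot code: `LgrpSpace`, `NumaEdenModel`, and the byte-counter form of `cas_allocate` are hypothetical names chosen for illustration, and the real allocation path uses a CAS on the space's top pointer, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one LGRP-local space inside the NUMA-aware eden.
struct LgrpSpace {
  std::size_t capacity_bytes;
  std::size_t used_bytes;
  std::size_t free_bytes() const { return capacity_bytes - used_bytes; }
};

// Simplified model of the current behavior: allocation and all three
// accounting queries are keyed on the LGRP id cached in the calling thread.
struct NumaEdenModel {
  std::vector<LgrpSpace> lgrps;

  // Allocates only on the thread's cached LGRP; there is no fallback
  // to a remote LGRP when the local one is full.
  bool cas_allocate(std::size_t thread_lgrp_id, std::size_t bytes) {
    assert(thread_lgrp_id < lgrps.size());
    LgrpSpace& local = lgrps[thread_lgrp_id];
    if (bytes > local.free_bytes()) {
      return false;  // fails even if another LGRP still has room
    }
    local.used_bytes += bytes;
    return true;
  }

  // Accounting only looks at the LGRP matching the cached id.
  std::size_t tlab_capacity(std::size_t thread_lgrp_id) const {
    assert(thread_lgrp_id < lgrps.size());
    return lgrps[thread_lgrp_id].capacity_bytes;
  }

  std::size_t tlab_used(std::size_t thread_lgrp_id) const {
    assert(thread_lgrp_id < lgrps.size());
    return lgrps[thread_lgrp_id].used_bytes;
  }

  std::size_t unsafe_max_tlab_alloc(std::size_t thread_lgrp_id) const {
    assert(thread_lgrp_id < lgrps.size());
    return lgrps[thread_lgrp_id].free_bytes();
  }
};
```

In this model, a thread whose cached id changes from 0 to 1 after a migration immediately sees a different capacity, used, and free value, even though nothing about its own allocation history has changed.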

For example, a system with two NUMA nodes gives us two LGRPs with ids 0 and 1. If a thread allocates most of its memory on LGRP 0 and then migrates to a CPU on LGRP 1, the thread will show that it allocated a significant amount of memory, but the used memory on the LGRP it is currently on could be very low. This would give a disproportionate allocation fraction. This is not a problem as the TLAB code accounts for this, but for a different reason entirely. The other way around could also be problematic. If a thread allocates very little memory on LGRP 0 and then migrates to LGRP 1, where another thread has allocated a lot of memory, the allocation fraction will be very low, when it could have a really high fraction if accounting for the used memory on its original LGRP.
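
To make the two cases concrete, here is a small sketch with made-up numbers. It assumes the allocation fraction is the thread's allocated bytes divided by the value reported by `tlab_used`, clamped to 1.0; the function name `allocation_fraction`, the clamp, and all sizes are assumptions for illustration, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t MB = 1024 * 1024;

// Assumed definition for illustration: fraction of the "used" figure
// that this thread is responsible for, clamped to at most 1.0.
double allocation_fraction(std::size_t thread_allocated_bytes,
                           std::size_t tlab_used_bytes) {
  assert(tlab_used_bytes > 0);
  const double raw = static_cast<double>(thread_allocated_bytes) /
                     static_cast<double>(tlab_used_bytes);
  return std::min(1.0, raw);
}

// Case 1: the thread allocated 800 MB on LGRP 0, then migrated to LGRP 1,
// where only 50 MB is in use. The raw ratio is 16, far above 1.
double migrated_heavy_allocator() {
  return allocation_fraction(800 * MB, 50 * MB);
}

// Case 2: the thread allocated 10 MB on LGRP 0 (its only user), then
// migrated to LGRP 1, where another thread has already used 1000 MB.
double migrated_light_allocator() {
  return allocation_fraction(10 * MB, 1000 * MB);
}

// Same thread as in case 2, measured against its original LGRP instead.
double light_allocator_on_original_lgrp() {
  return allocation_fraction(10 * MB, 10 * MB);
}
```

Under these assumptions, case 1 is absorbed by the clamp, while case 2 drops the thread from a fraction of 1.0 on its original LGRP to roughly 0.01 after the migration.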

A solution to both of these issues is to average the capacity, used, and available memory across all LGRPs for the TLAB accounting methods. This approach provides a more accurate and stable view of memory usage and availability, regardless of thread migration or imbalances in NUMA/LGRP allocation. However, there are trade-offs to consider. Averaging the accounting may mask allocation pressure on specific LGRPs and could result in the allocation history of TLABs being updated more/less frequently. In summary, considering the whole eden space (i.e., the entire MutableNUMASpace) instead of just a single LGRP when accounting for TLABs should give a more holistic view.
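
A minimal sketch of the averaging idea, assuming a plain arithmetic mean over all LGRP spaces. The names `LgrpSpace`, `avg_tlab_capacity`, `avg_tlab_used`, `avg_unsafe_max_tlab_alloc`, and `align_down_bytes` are hypothetical, and `kMinObjAlignmentInBytes = 8` is an assumed value standing in for HotSpot's `MinObjAlignmentInBytes`. The align-down step reflects the later commit in this PR noting that `unsafe_max_tlab_alloc` must be aligned to `MinObjAlignmentInBytes`; a division can otherwise produce an unaligned size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one LGRP-local space.
struct LgrpSpace {
  std::size_t capacity_bytes;
  std::size_t used_bytes;
  std::size_t free_bytes() const { return capacity_bytes - used_bytes; }
};

// Assumed value for illustration only.
constexpr std::size_t kMinObjAlignmentInBytes = 8;

// Rounds value down to a multiple of alignment (must be a power of two).
std::size_t align_down_bytes(std::size_t value, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return value & ~(alignment - 1);
}

// Thread-agnostic accounting: average over all LGRPs instead of
// looking up the single LGRP cached in the calling thread.
std::size_t avg_tlab_capacity(const std::vector<LgrpSpace>& lgrps) {
  assert(!lgrps.empty());
  std::size_t sum = 0;
  for (const LgrpSpace& ls : lgrps) sum += ls.capacity_bytes;
  return sum / lgrps.size();
}

std::size_t avg_tlab_used(const std::vector<LgrpSpace>& lgrps) {
  assert(!lgrps.empty());
  std::size_t sum = 0;
  for (const LgrpSpace& ls : lgrps) sum += ls.used_bytes;
  return sum / lgrps.size();
}

std::size_t avg_unsafe_max_tlab_alloc(const std::vector<LgrpSpace>& lgrps) {
  assert(!lgrps.empty());
  std::size_t sum = 0;
  for (const LgrpSpace& ls : lgrps) sum += ls.free_bytes();
  // The average may not be a multiple of the object alignment.
  return align_down_bytes(sum / lgrps.size(), kMinObjAlignmentInBytes);
}
```

For example, with two LGRPs that have 24 and 1012 free bytes, the average is 518 bytes, which is aligned down to 512. Note that the result no longer depends on which LGRP the calling thread happens to run on, which is what makes the `Thread` parameter unnecessary in this model.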

I plan to clean up the Thread* parameter to tlab_capacity, tlab_used, and unsafe_max_tlab_alloc in a follow-up RFE, since it is a pretty significant cleanup.

Testing:

No performance impact on traditional benchmarks running with -XX:+UseParallelGC -XX:+UseNUMA. Slight improvement on SPECjbb2005.
hotspot:tier1-4 on a local NUMA machine

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8370345: Parallel: Rework TLAB accounting in MutableNUMASpace (Enhancement - P4)

Reviewers

Albert Mingkun Yang (@albertnetymk - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27935/head:pull/27935
$ git checkout pull/27935

Update a local copy of the PR:
$ git checkout pull/27935
$ git pull https://git.openjdk.org/jdk.git pull/27935/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27935

View PR using the GUI difftool:
$ git pr show -t 27935

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27935.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-10-22T08:58:41Z

👋 Welcome back jsikstro! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-10-22T08:59:02Z

@jsikstro This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8370345: Parallel: Rework TLAB accounting in MutableNUMASpace

Reviewed-by: ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 72 new commits pushed to the master branch:

7bb490c: 8370318: AES-GCM vector intrinsic may read out of bounds (x86_64, AVX-512)
6f8d07a: 8368500: ContextClassLoader cannot be reset on threads in ForkJoinPool.commonPool()
91e1dcb: 8366781: Parallel: Include OS free memory in GC selection heuristics
... and 69 more: https://git.openjdk.org/jdk/compare/9a88d7f468cdd040bdf4e1ff9441dc9c66eab03e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2025-10-22T09:00:02Z

@jsikstro The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-10-22T09:03:51Z

Webrevs

jsikstro · 2025-10-22T09:10:06Z

/cc add hotspot-gc

openjdk · 2025-10-22T09:11:13Z

@jsikstro
The hotspot-gc label was successfully added.

src/hotspot/share/gc/parallel/mutableNUMASpace.cpp

8370345: Parallel: Rework TLAB accounting in MutableNUMASpace

703a7dc

openjdk bot added the hotspot hotspot-dev@openjdk.org label Oct 22, 2025

openjdk bot added the rfr Pull request is ready for review label Oct 22, 2025

openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Oct 22, 2025

unsafe_max_tlab_alloc must be aligned to MinObjAlignmentInBytes

c8a2182

albertnetymk reviewed Oct 27, 2025

src/hotspot/share/gc/parallel/mutableNUMASpace.cpp (outdated, resolved)

Comment on unsafe_max_tlab_alloc alignment

c4cd91a

albertnetymk approved these changes Oct 27, 2025

openjdk bot added the ready Pull request is ready to be integrated label Oct 27, 2025

2 participants
